12 research outputs found

    3D Representation Learning for Shape Reconstruction and Understanding

    Get PDF
    The real world we live in is inherently composed of multiple 3D objects. However, most existing works in computer vision focus on images or videos, where 3D information is inevitably lost through camera projection. Traditional methods typically rely on hand-crafted algorithms and features, with many constraints and geometric priors, to understand the real world. Following the trend of deep learning, there has been exponential growth in research based on deep neural networks for learning 3D representations of complex shapes and scenes, leading to many cutting-edge applications in augmented reality (AR), virtual reality (VR), and robotics, and making this one of the most important directions in computer vision and computer graphics. This thesis aims to build an intelligent system with dynamic 3D representations that can change over time to understand and recover the real world with semantic, instance, and geometric information, and eventually to bridge the gap between the real world and the digital world. As a first step towards these challenges, this thesis explores both explicit and implicit representations by addressing existing open problems in these areas. It starts with neural implicit representation learning for 3D scene representation and understanding, and then moves to a parametric-model-based explicit 3D reconstruction method. Extensive experimentation on benchmarks across various domains demonstrates the superiority of our methods over previous state-of-the-art approaches, enabling many real-world applications. Based on the proposed methods and current observations of open problems, the thesis concludes with a comprehensive summary and potential future research directions.
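    The abstract contrasts explicit and neural implicit 3D representations. As a rough, hypothetical illustration of the implicit side (not the thesis's actual model), a coordinate MLP can represent a shape as a signed distance field whose zero level set is the surface:

```python
# Hypothetical sketch of a neural implicit shape representation (a signed
# distance field regressed by a coordinate MLP). Architecture and sizes are
# illustrative only, not the thesis's actual model.
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),            # signed distance to the surface
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points; returns (N, 1) signed distances.
        return self.mlp(xyz)

# The shape is implicit: its surface is the zero level set {x : f(x) = 0},
# which can be extracted with marching cubes over a dense grid of queries.
```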

    $T\overline{T}$ deformation in SCFTs and integrable supersymmetric theories

    Full text link
    We calculate the $\mathcal{S}$-multiplets for two-dimensional Euclidean $\mathcal{N}=(0,2)$ and $\mathcal{N}=(2,2)$ superconformal field theories under the $T\overline{T}$ deformation at leading order of perturbation theory in the deformation coupling. Then, from these $\mathcal{N}=(0,2)$ deformed multiplets, we calculate two- and three-point correlators. We show the $\mathcal{N}=(0,2)$ chiral ring's elements do not flow under the $T\overline{T}$ deformation. For the case of $\mathcal{N}=(2,2)$, we show the twisted chiral ring and chiral ring cease to exist simultaneously. Specializing to integrable supersymmetric seed theories, such as $\mathcal{N}=(2,2)$ Landau-Ginzburg models, we use the thermodynamic Bethe ansatz to study the $S$-matrices and ground state energies. From both an $S$-matrix perspective and Melzer's folding prescription, we show that the deformed ground state energy obeys the inviscid Burgers' equation. Finally, we show that several indices independent of $D$-term perturbations, including the Witten index, the Cecotti-Fendley-Intriligator-Vafa index, and the elliptic genus, do not flow under the $T\overline{T}$ deformation. Comment: 46 pages
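    For context, the Burgers' flow referenced here is the standard evolution equation for $T\overline{T}$-deformed finite-volume energy levels; a commonly quoted form (normalization conventions for the coupling $\lambda$ vary between references) is sketched below.

```latex
% Flow of the deformed energy levels on a circle of circumference R;
% for the zero-momentum ground state (P_0 = 0) this reduces to the
% inviscid Burgers' equation. Coupling normalization is convention-dependent.
\partial_{\lambda} E_n(R,\lambda) = E_n \,\partial_R E_n + \frac{P_n^2}{R},
\qquad
\partial_{\lambda} E_0(R,\lambda) = E_0 \,\partial_R E_0 .
```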

    SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark

    Full text link
    In this paper, we present SignAvatars, the first large-scale multi-prompt 3D sign language (SL) motion dataset, designed to bridge the communication gap for hearing-impaired individuals. While there has been an exponentially growing body of research on digital communication, the majority of existing communication technologies primarily cater to spoken or written languages rather than SL, the essential communication method for hearing-impaired communities. Existing SL datasets, dictionaries, and sign language production (SLP) methods are typically limited to 2D, as annotating 3D models and avatars for SL is usually an entirely manual, labor-intensive process conducted by SL experts, often resulting in unnatural avatars. In response to these challenges, we compile and curate the SignAvatars dataset, which comprises 70,000 videos from 153 signers, totaling 8.34 million frames, covering both isolated signs and continuous, co-articulated signs, with multiple prompts including HamNoSys, spoken language, and words. To yield 3D holistic annotations, including meshes and biomechanically-valid poses of the body, hands, and face, as well as 2D and 3D keypoints, we introduce an automated annotation pipeline operating on our large corpus of SL videos. SignAvatars facilitates various tasks such as 3D sign language recognition (SLR) and the novel 3D SL production (SLP) from diverse inputs like text scripts, individual words, and HamNoSys notation. Hence, to evaluate the potential of SignAvatars, we further propose a unified benchmark of 3D SL holistic motion production. We believe that this work is a significant step towards bringing the digital world to hearing-impaired communities. Our project page is at https://signavatars.github.io/ Comment: 9 pages; project page available at https://signavatars.github.io
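    The abstract describes per-frame 3D holistic annotations (body, hand, and face poses plus 2D/3D keypoints and a prompt). As a hypothetical illustration of such a record, with field names and dimensions that are assumptions rather than the dataset's actual schema, one frame's annotation might be structured as follows:

```python
# Hypothetical per-frame holistic annotation record of the kind the abstract
# describes. Field names and shapes are illustrative assumptions only.
from dataclasses import dataclass
import numpy as np

@dataclass
class HolisticFrameAnnotation:
    body_pose: np.ndarray        # (21, 3) axis-angle body joint rotations
    left_hand_pose: np.ndarray   # (15, 3) axis-angle hand joint rotations
    right_hand_pose: np.ndarray  # (15, 3)
    expression: np.ndarray       # (10,) facial expression coefficients
    betas: np.ndarray            # (10,) body shape coefficients
    keypoints_2d: np.ndarray     # (J, 2) image-plane keypoints
    keypoints_3d: np.ndarray     # (J, 3) keypoints in camera coordinates
    prompt: str                  # HamNoSys, spoken-language sentence, or gloss
```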

    Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training

    Full text link
    Estimating human pose from video is a task that receives considerable attention due to its applicability in numerous 3D fields. The complexity of prior knowledge of human body movements poses a challenge to neural network models in the task of regressing keypoints. In this paper, we address this problem by incorporating a motion prior in an adversarial way. Different from previous methods, we propose to decompose the holistic motion prior into joint motion priors, making it easier for neural networks to learn from prior knowledge and thereby boosting performance on the task. We also utilize a novel regularization loss to balance the accuracy and the smoothness introduced by the motion prior. Our method achieves 9% lower PA-MPJPE and 29% lower acceleration error than previous methods when tested on 3DPW. The estimator proves its robustness by achieving impressive performance on in-the-wild datasets.
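    The abstract combines a decomposed (per-joint) adversarial motion prior with a smoothness regularizer. The sketch below is a rough illustration of that idea under assumed shapes and names, not the paper's actual implementation: a small discriminator scores a temporal window of a single joint's rotations, and an acceleration penalty serves as the smoothness term.

```python
# Illustrative sketch: per-joint adversarial motion prior + smoothness loss.
# Names, shapes, and loss weights are assumptions for illustration only.
import torch
import torch.nn as nn

class JointMotionDiscriminator(nn.Module):
    """Scores a short temporal window of one joint's rotations as real/fake."""
    def __init__(self, window: int = 16, rot_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window * rot_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, joint_seq: torch.Tensor) -> torch.Tensor:
        # joint_seq: (B, window, rot_dim) for a single joint
        return self.net(joint_seq.flatten(1))

def smoothness_loss(joints_3d: torch.Tensor) -> torch.Tensor:
    # joints_3d: (B, T, J, 3); penalize second differences (acceleration)
    accel = joints_3d[:, 2:] - 2 * joints_3d[:, 1:-1] + joints_3d[:, :-2]
    return accel.pow(2).mean()
```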

    U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation

    Full text link
    Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lack of investigation into fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes. This paper presents U3DS$^3$ as a step towards completely unsupervised point cloud segmentation for any holistic 3D scene. To achieve this, U3DS$^3$ leverages a generalized unsupervised segmentation method for both objects and background across indoor and outdoor static 3D point clouds, with no requirement for model pre-training, using only the inherent information of the point cloud to achieve full 3D scene segmentation. The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene. Subsequently, the model undergoes a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids. Moreover, by leveraging the invariance and equivariance of the volumetric representations, we apply geometric transformations to voxelized features to provide two sets of descriptors for robust representation learning. Finally, our evaluation provides state-of-the-art results on the ScanNet and SemanticKITTI benchmark datasets, and competitive results on S3DIS. Comment: 10 pages, 4 figures, accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
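    The abstract outlines an iterative loop of clustering and pseudo-label training. The sketch below is an illustrative take on that loop under assumed function names and a nearest-centroid assignment; it is not the paper's exact procedure.

```python
# Illustrative sketch of iterative training with pseudo-labels derived from
# cluster centroids (assumptions: per-point descriptors, nearest-centroid
# assignment, cross-entropy against centroid scores).
import torch
import torch.nn.functional as F

def pseudo_label_step(features: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """Assign each point feature to its nearest cluster centroid."""
    # features: (N, D), centroids: (K, D)
    dists = torch.cdist(features, centroids)   # (N, K) pairwise distances
    return dists.argmin(dim=1)                 # (N,) pseudo-labels

def training_iteration(model, points, centroids, optimizer) -> float:
    feats = model(points)                      # (N, D) per-point descriptors
    with torch.no_grad():
        pseudo = pseudo_label_step(feats, centroids)
    logits = feats @ centroids.t()             # score each point against centroids
    loss = F.cross_entropy(logits, pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```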

    The Hitchhiker's Guide to 4d $\mathcal{N}=2$ Superconformal Field Theories

    Full text link
    Superconformal field theory with $\mathcal{N}=2$ supersymmetry in four-dimensional spacetime provides a prime playground to study strongly coupled phenomena in quantum field theory. Its rigid structure ensures valuable analytic control over non-perturbative effects, yet the theory is still flexible enough to incorporate a large landscape of quantum systems. Here we aim to offer a guidebook to fundamental features of the 4d $\mathcal{N}=2$ superconformal field theories and basic tools to construct them in string/M-/F-theory. The content is based on a series of lectures at the Quantum Field Theories and Geometry School (https://sites.google.com/view/qftandgeometrysummerschool/home) in July 2020. Comment: v3: improved discussion, fixed typos, added references. v2: typos fixed, added references. v1: 96 pages. Based on a series of lectures at the Quantum Field Theories and Geometry School in July 2020

    P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching

    Get PDF
    Accurately describing and detecting 2D and 3D keypoints is crucial to establishing correspondences across images and point clouds. Although a plethora of learning-based 2D or 3D local feature descriptors and detectors have been proposed, the derivation of a shared descriptor and joint keypoint detector that directly matches pixels and points remains under-explored by the community. This work takes the initiative to establish fine-grained correspondences between 2D images and 3D point clouds. In order to directly match pixels and points, a dual fully convolutional framework is presented that maps 2D and 3D inputs into a shared latent representation space to simultaneously describe and detect keypoints. Furthermore, an ultra-wide reception mechanism, in combination with a novel loss function, is designed to mitigate the intrinsic information variations between pixel and point local regions. Extensive experimental results demonstrate that our framework shows competitive performance in fine-grained matching between images and point clouds and achieves state-of-the-art results for the task of indoor visual localization. Our source code will be available at [no-name-for-blind-review]. Comment: ICCV 202
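    The abstract's core idea is a dual branch (2D CNN for images, point network for point clouds) emitting descriptors in a shared space plus a keypoint score per pixel/point. The following is a minimal sketch of that structure with placeholder architectures, not P2-Net's actual networks.

```python
# Minimal sketch of a dual-branch shared descriptor/detector. Backbones and
# dimensions are placeholder assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageBranch(nn.Module):
    def __init__(self, desc_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, desc_dim + 1, 3, padding=1),  # descriptors + score
        )

    def forward(self, img: torch.Tensor):
        out = self.backbone(img)                         # (B, D+1, H, W)
        desc = F.normalize(out[:, :-1], dim=1)           # per-pixel descriptors
        score = torch.sigmoid(out[:, -1:])               # per-pixel keypoint score
        return desc, score

class PointBranch(nn.Module):
    def __init__(self, desc_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, desc_dim + 1),                 # descriptors + score
        )

    def forward(self, pts: torch.Tensor):
        out = self.mlp(pts)                              # (B, N, D+1)
        desc = F.normalize(out[..., :-1], dim=-1)        # per-point descriptors
        score = torch.sigmoid(out[..., -1:])             # per-point keypoint score
        return desc, score

# Matching pairs pixels and points by nearest-neighbor search over the shared
# descriptor space, weighted by the two detection scores.
```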